
feat: support table sample #16505


Open: wants to merge 12 commits into main


chenkovsky
Contributor

@chenkovsky chenkovsky commented Jun 23, 2025

Which issue does this PR close?

Closes #16533

Rationale for this change

Table sampling is currently not supported.

What changes are included in this PR?

Adds support for table sampling.
Sampling is row-level.
Three sampling methods are supported:

  1. fixed row count
  2. Bernoulli sampling
  3. Poisson sampling
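
The description does not spell out the semantics of each method, so the following is only a rough, dependency-free sketch of what row-level sampling under the three methods could look like. It assumes that a fixed row count means taking at most `n` rows in input order, that Bernoulli sampling keeps each row independently with a given probability, and that Poisson sampling emits each row a Poisson-distributed number of times (sampling with replacement). Every name here (`XorShift64`, `sample_fixed_rows`, `sample_bernoulli`, `sample_poisson`, `poisson_draw`) is hypothetical and is not taken from this PR or from DataFusion.

```rust
/// Tiny deterministic PRNG (xorshift64*), used only to keep this sketch
/// free of external crates. Hypothetical helper, not part of the PR.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // The xorshift state must never be zero.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Returns a uniform value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        let r = x.wrapping_mul(0x2545_F491_4F6C_DD1D);
        // Keep the top 53 bits so the value is exactly representable.
        (r >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Fixed row count (assumption: take at most `n` rows in input order).
fn sample_fixed_rows<T: Clone>(rows: &[T], n: usize) -> Vec<T> {
    rows.iter().take(n).cloned().collect()
}

/// Bernoulli sampling: keep each row independently with probability
/// `fraction`. Each row appears at most once in the output.
fn sample_bernoulli<T: Clone>(rows: &[T], fraction: f64, rng: &mut XorShift64) -> Vec<T> {
    rows.iter()
        .filter(|_| rng.next_f64() < fraction)
        .cloned()
        .collect()
}

/// Draws from a Poisson distribution with mean `lambda` using Knuth's
/// multiplication method (suitable for small `lambda` only).
fn poisson_draw(lambda: f64, rng: &mut XorShift64) -> usize {
    let limit = (-lambda).exp();
    let mut k = 0;
    let mut p = rng.next_f64();
    while p > limit {
        k += 1;
        p *= rng.next_f64();
    }
    k
}

/// Poisson sampling: emit each row `k` times, where `k` is drawn from
/// Poisson(`lambda`). A row may appear zero, one, or several times.
fn sample_poisson<T: Clone>(rows: &[T], lambda: f64, rng: &mut XorShift64) -> Vec<T> {
    let mut out = Vec::new();
    for row in rows {
        for _ in 0..poisson_draw(lambda, rng) {
            out.push(row.clone());
        }
    }
    out
}
```

In this sketch the practical difference between the last two methods is that Bernoulli sampling never repeats a row, while Poisson sampling can, so the Poisson output may be longer than the input.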

Are these changes tested?

Yes, with unit tests.

Are there any user-facing changes?

Yes. Users who use a `match` statement on the logical plan need to add the new sample node to that `match` statement.
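
As an illustration of why downstream code is affected, here is a minimal sketch using a toy enum. `ToyPlan`, its variants, and `describe` are hypothetical stand-ins, not DataFusion's actual logical plan types. The point is only that an exhaustive `match` without a wildcard arm stops compiling once a new variant exists, until an arm for it is added.

```rust
/// Toy stand-in for a logical plan enum. This is NOT DataFusion's real
/// `LogicalPlan`; the variants and fields are invented for illustration.
enum ToyPlan {
    Scan { table: String },
    Filter { predicate: String, input: Box<ToyPlan> },
    /// Hypothetical variant standing in for the new sample node.
    Sample { fraction: f64, input: Box<ToyPlan> },
}

/// Exhaustive match with no `_` arm: removing the `Sample` arm below
/// would make this function fail to compile.
fn describe(plan: &ToyPlan) -> String {
    match plan {
        ToyPlan::Scan { table } => format!("Scan({})", table),
        ToyPlan::Filter { predicate, input } => {
            format!("Filter({}) <- {}", predicate, describe(input))
        }
        ToyPlan::Sample { fraction, input } => {
            format!("Sample({}) <- {}", fraction, describe(input))
        }
    }
}
```

Code that already ends its `match` with a wildcard arm would keep compiling, but would silently route the new node through the fallback branch.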

@github-actions github-actions bot added the sql, logical-expr, optimizer, core, sqllogictest, substrait, proto, and physical-plan labels Jun 23, 2025
@chenkovsky chenkovsky changed the title from "Feat/sample" to "feat: support table sample" Jun 23, 2025
@xudong963
Member

It would be better to add more details about the PR, such as:
sampling level: block level or row level?
sampling method: fixed row count or percentage?

@chenkovsky
Contributor Author

@xudong963 Updated.

@2010YOUY01
Contributor

I suggest first opening an issue that describes the full syntax and semantics of this table sample feature, including the reference system (such as Postgres). Once we have reached some agreement, we can start implementing.

There is another implementation, #16325 (@theirix), that seems to have several syntax differences from this PR.

We previously discussed that DF can include features that follow Postgres syntax. However, if a feature references other systems, it might need more discussion and wider approval.

@chenkovsky
Contributor Author


Updated. This PR implements Spark-style sampling.

Development

Successfully merging this pull request may close these issues.

Feature: Support Sample